THE PARALLEL REPLICA METHOD FOR COMPUTING
EQUILIBRIUM AVERAGES OF MARKOV CHAINS 


DAVID ARISTOFF 


Abstract. An algorithm is proposed for computing equilibrium averages of
Markov chains which suffer from metastability, that is, the tendency to remain in
one or more subsets of state space for long time intervals. The algorithm,
called the parallel replica method (or ParRep), uses many parallel processors
to explore these subsets more efficiently. Numerical simulations on simple
models demonstrate consistency of the method. A proof of consistency is given
in an idealized setting. The parallel replica method can be considered a
generalization of A.F. Voter's parallel replica dynamics, originally developed to
efficiently simulate metastable Langevin stochastic dynamics.


1. Introduction 

This article concerns the problem of computing equilibrium averages of time
homogeneous, ergodic Markov chains in the presence of metastability. A Markov chain
is said to be metastable if it typically has very long sojourn times in certain subsets
of state space, called metastable sets. A new method, called the parallel replica
method (or ParRep), is proposed for efficiently computing equilibrium averages in
this setting.

Markov chains are widely used to model physical systems. In computational
statistical physics (the main setting for this article), Markov chains are used
to understand macroscopic properties of matter, starting from a mesoscopic or
microscopic description. Equilibrium averages then correspond to bulk properties
of the physical system under consideration, like average density or internal energy.
A popular class of such models is the class of Markov State Models [?]. Markov
chains also arise as time discretizations of continuous time models like the Langevin
dynamics [8], a popular stochastic model for molecular dynamics. For examples of
Markov chain models not obtained from an underlying continuous time dynamics,
see for example [?]. It should be emphasized that the discrete in time setting is
generic: even if the underlying model is continuous in time, what must be simulated
in practice is a time-discretized version.

In computational statistical physics, metastability arises from entropic barriers,
which are bottlenecks in state space, as well as energetic barriers, which are regions
separating metastable states through which crossings are unlikely (due to, for
example, high energy saddle points in a potential energy landscape separating the
states). See Figures 1 and 2 for simple examples of entropic and energetic barriers.


Date: 31 December 2014.

2000 Mathematics Subject Classification. 65C05, 65C20, 65C40, 60J22, 65Y05. 

Key words and phrases. Monte Carlo methods, Markov chain Monte Carlo, metastability, 
parallel computing. 


Figure 1. A random walk $X_n$ on state space
$\{-1,-2,\ldots,-100\}^2 \cup \{1,2,\ldots,200\}^2$ with an entropic
barrier. At each step, a direction up, down, left or right is selected at
random, each with probability 1/4. Then $X_n$ moves one unit in
this direction, provided this does not result in crossing a barrier,
i.e., one of the edges of the two boxes pictured. The walk can
cross from the left box to the right box only through the narrow
pathways indicated. The metastable sets are $\mathcal{S} = \{S_1, S_2\}$.

The method proposed here is closely related to a recently proposed algorithm [1],
also called ParRep, for efficient simulation of metastable Markov chains on a
coarsened state space. That algorithm can be considered an adaptation of A.F. Voter's
parallel replica dynamics [?] to a discrete time setting. (For a mathematical
analysis of A.F. Voter's original algorithm, see [7].) ParRep was shown to be consistent
with an analysis based on quasistationary distributions (QSDs), or local equilibria
associated with each metastable set. ParRep uses parallel processing to explore
phase space more efficiently in real time. A cost of the parallelization is that only
a coarse version of the Markov chain dynamics, defined on the original state space
modulo the collection of metastable sets, is obtained. In this article it is shown
that a simple modification of the ParRep algorithm of [1] nonetheless allows for
computation of equilibrium averages of the original, uncoarsened Markov chain.

The ParRep algorithm proposed here is very general. It can be applied to any
Markov chain, and gains in efficiency can be expected when the chain is metastable
and the metastable sets can be properly identified (either a priori or on the fly).
In particular, it can be applied to metastable Markov chains with both energetic
and entropic barriers, and no assumptions about barrier heights, temperature or
reversibility are required. While there exist many methods for sampling from a
distribution, most methods, particularly in Markov chain Monte Carlo [?], rely
on a priori knowledge of relative probabilities of the distribution. In contrast with
these methods, ParRep does not require any information about the equilibrium
distribution of the Markov chain.

Figure 2. A random walk $X_n$ on state space $\{1,2,\ldots,60\}$ with
energy barriers. The random walk moves one unit left or right
according to a biased coin flip: if $X_n = x$ and the slope of the
pictured graph at $x$ is $w$, then with probability $1/2 + w$,
$X_{n+1} = \max\{x-1, 1\}$, and with probability $1/2 - w$,
$X_{n+1} = \min\{x+1, 60\}$. The metastable sets are $\mathcal{S} = \{S_1, S_2, S_3\}$.

The article is organized as follows. Section 2 defines the QSD and notation used
throughout. Section 3 introduces the ParRep algorithm for computing equilibrium
averages (Algorithm 2). In Section 4 consistency of the algorithm is demonstrated
on the simple models pictured in Figures 1 and 2. A proof of consistency in an
idealized setting is given in the Appendix. Some concluding remarks are made in
Section 5.


2. Notation and the quasistationary distribution 

Throughout, $(X_n)_{n\ge 0}$ is a time homogeneous Markov chain on a standard Borel
state space, and $\mathbb{P}_\xi$ is the associated measure when $X_0 \sim \xi$, where $\sim$ denotes
equality in law. All sets and functions are assumed measurable without explicit
mention. The collection of metastable sets will be written $\mathcal{S}$, with elements of $\mathcal{S}$
denoted by $S$. Formally, $\mathcal{S}$ is simply a set of disjoint subsets of state space.
Definition 1. A probability measure $\nu$ with support in $S$ is called a quasistationary
distribution (QSD) if for all $n \ge 0$ and all $A \subset S$,

$$\nu(A) = \mathbb{P}_\nu(X_n \in A \mid X_1 \in S, \ldots, X_n \in S).$$


That is, if $X_n \sim \nu$, then conditionally on $X_{n+1} \in S$, $X_{n+1} \sim \nu$. It is not hard
to check that, if for every probability measure $\xi$ supported in $S$ and every $A \subset S$,

$$\nu(A) = \lim_{n\to\infty} \mathbb{P}_\xi(X_n \in A \mid X_1 \in S, \ldots, X_n \in S), \tag{2.1}$$

then $\nu$ is the unique QSD in $S$. Informally, if (2.1) holds, then $(X_n)_{n\ge 0}$ is close
to $\nu$ whenever it spends a sufficiently long time in $S$ without leaving. Of course $\nu$
depends on $S$, but this will not be indicated explicitly.
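For a finite state space the QSD can be computed directly, which may help make Definition 1 and (2.1) concrete. If $Q$ denotes the transition matrix restricted to $S$ (rows and columns in $S$) and $Q$ is irreducible and aperiodic, then $\nu$ is the normalized left Perron eigenvector of $Q$, and its eigenvalue $\lambda$ equals $\mathbb{P}_\nu(X_1 \in S)$. The sketch below assumes this finite, irreducible setting; the function names `qsd` and `lazy_walk` and the toy lazy random walk on $\{0,\ldots,5\}$ with $S = \{0,1,2\}$ are illustrative choices, not part of the method. It checks the one-step case of Definition 1 and the limit (2.1) started from a point mass.

```python
import numpy as np

def qsd(P, S):
    """QSD of a finite chain with transition matrix P in the set S (list of indices).

    Assumes Q = P restricted to S is irreducible and aperiodic, so the
    Perron eigenvalue of Q is simple, real and positive."""
    Q = P[np.ix_(S, S)]                      # sub-stochastic kernel, killed on exit
    vals, vecs = np.linalg.eig(Q.T)          # columns: left eigenvectors of Q
    k = np.argmax(vals.real)                 # Perron eigenvalue
    nu = np.abs(vecs[:, k].real)             # Perron vector has a single sign
    return nu / nu.sum(), vals[k].real       # (nu, lambda = P_nu(X_1 in S))

def lazy_walk(n):
    """Toy chain: stay w.p. 1/2, else step left/right, reflecting at the ends."""
    P = np.zeros((n, n))
    for x in range(n):
        P[x, max(x - 1, 0)] += 0.25
        P[x, min(x + 1, n - 1)] += 0.25
        P[x, x] += 0.5
    return P

P = lazy_walk(6)
S = [0, 1, 2]
nu, lam = qsd(P, S)
Q = P[np.ix_(S, S)]

# Definition 1 with n = 1: one step conditioned on staying in S leaves nu unchanged.
one_step = nu @ Q
print(np.allclose(one_step / one_step.sum(), nu), lam)

# Limit (2.1): start from a point mass in S and condition on survival each step.
xi = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    xi = xi @ Q
    xi /= xi.sum()
print(np.allclose(xi, nu))
```

The normalized iteration $\xi \mapsto \xi Q / (\xi Q \mathbf{1})$ is exactly the conditional law appearing in (2.1), so its convergence rate is governed by the ratio of the two largest eigenvalues of $Q$.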


3. The ParRep algorithm 


Let $(X_n)_{n\ge 0}$ be ergodic with equilibrium measure $\mu$, and fix a bounded
real-valued function $f$ defined on state space. The output of ParRep is an estimate of
the average of $f$ with respect to $\mu$. The algorithm requires existence of a unique
QSD in each metastable set, so it is assumed that for each $S \in \mathcal{S}$ there is a unique
$\nu$ satisfying (2.1). This assumption holds under very general mixing conditions;
see [?].

The user-chosen parameters of the algorithm are the number of replicas, $N$;
the decorrelation and dephasing times, $T_{\mathrm{corr}}$ and $T_{\mathrm{phase}}$; and a polling time, $T_{\mathrm{poll}}$.
The parameters $T_{\mathrm{corr}}$ and $T_{\mathrm{phase}}$ are closely related to the time needed to reach
the QSD; both may depend on $S \in \mathcal{S}$. To emphasize this, $T_{\mathrm{corr}}(S)$ or
$T_{\mathrm{phase}}(S)$ is sometimes written. The parameter $T_{\mathrm{poll}}$ is a polling time at which the parallel
replicas resynchronize. See below for further discussion.


Algorithm 2. Set the simulation clock to zero: $T_{\mathrm{sim}} = 0$, and set $f_{\mathrm{sim}} = 0$. Then
iterate the following:

• Decorrelation Step.

Evolve $(X_n)_{n\ge 0}$ from time $n = T_{\mathrm{sim}}$ until time $n = \sigma$, where $\sigma$ is the
smallest number $n \ge T_{\mathrm{sim}} + T_{\mathrm{corr}} - 1$ such that there exists $S \in \mathcal{S}$ with
$X_n \in S, X_{n-1} \in S, \ldots, X_{n-T_{\mathrm{corr}}+1} \in S$. Meanwhile, update

$$f_{\mathrm{sim}} = f_{\mathrm{sim}} + \sum_{n=T_{\mathrm{sim}}+1}^{\sigma} f(X_n).$$

Then set $T_{\mathrm{sim}} = \sigma$ and proceed to the Dephasing Step, with $S$ now the
metastable set having $X_\sigma \in S$.

• Dephasing Step.

Generate $N$ independent samples, $x^1, \ldots, x^N$, of the QSD $\nu$ in $S$. Then
proceed to the Parallel Step.

• Parallel Step.

(i) Set $M = 1$ and $\tau_{\mathrm{acc}} = 0$. Let $(X_n^1)_{n\ge 0}, \ldots, (X_n^N)_{n\ge 0}$ be replicas of
$(X_n)_{n\ge 0}$, that is, Markov chains with the same law as $(X_n)_{n\ge 0}$ which are
independent of $(X_n)_{n\ge 0}$ and one another. Set $X_0^1 = x^1, \ldots, X_0^N = x^N$.

(ii) Evolve all the replicas from time $n = (M-1)T_{\mathrm{poll}}$ to time $n = MT_{\mathrm{poll}}$.

(iii) If none of the replicas leave $S$ during this time, update

$$f_{\mathrm{sim}} = f_{\mathrm{sim}} + \sum_{i=1}^{N} \sum_{j=(M-1)T_{\mathrm{poll}}+1}^{MT_{\mathrm{poll}}} f(X_j^i), \tag{3.1}$$

$$\tau_{\mathrm{acc}} = \tau_{\mathrm{acc}} + NT_{\mathrm{poll}}, \qquad M = M + 1,$$

and return to (ii) above. Otherwise, let $K$ be the smallest number such that
$(X_n^K)_{n\ge 0}$ leaves $S$ during this time, let $\tau^K \in [(M-1)T_{\mathrm{poll}} + 1, MT_{\mathrm{poll}}]$ be
the corresponding first exit time, and update

$$f_{\mathrm{sim}} = f_{\mathrm{sim}} + \sum_{i=1}^{K-1} \sum_{j=(M-1)T_{\mathrm{poll}}+1}^{MT_{\mathrm{poll}}} f(X_j^i) + \sum_{j=(M-1)T_{\mathrm{poll}}+1}^{\tau^K} f(X_j^K),$$

$$\tau_{\mathrm{acc}} = \tau_{\mathrm{acc}} + (K-1)T_{\mathrm{poll}} + \left(\tau^K - (M-1)T_{\mathrm{poll}}\right).$$

Then update $T_{\mathrm{sim}} = T_{\mathrm{sim}} + \tau_{\mathrm{acc}}$, set $X_{T_{\mathrm{sim}}} = X_{\mathrm{acc}} := X^K_{\tau^K}$, and return to
the Decorrelation Step.

See Figure 3 for an illustration of the Parallel Step. The key quantity in the
algorithm is the running average $f_{\mathrm{sim}}/T_{\mathrm{sim}}$, which is an estimate of the average of
$f$ with respect to the equilibrium measure $\mu$:

$$\frac{f_{\mathrm{sim}}}{T_{\mathrm{sim}}} \approx \int f \, d\mu.$$
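The three steps of Algorithm 2 can be collected into a short serial sketch. It is an illustration under stated assumptions, not a reference implementation: the replicas are evolved one after another on a single processor (so it reproduces the estimator $f_{\mathrm{sim}}/T_{\mathrm{sim}}$ but not any real-time gain), the Dephasing Step uses the Fleming-Viot procedure described in the remarks below, and the names `parrep`, `step` and `which_set`, as well as the fallback used when every Fleming-Viot walker exits at once, are hypothetical. The sketch also accumulates the wall clock time as it is defined in Section 4 (1 per Decorrelation time step, $T_{\mathrm{phase}}$ per Dephasing Step, $MT_{\mathrm{poll}}$ per Parallel Step).

```python
import numpy as np

def parrep(step, which_set, f, x0, N, T_corr, T_phase, T_poll, T_max, rng):
    """Serial sketch of Algorithm 2; returns (f_sim / T_sim, T_sim, speedup).

    step(x, rng) draws X_{n+1} given X_n = x; which_set(x) returns a label
    for the metastable set containing x, or None if x lies in no set."""
    T_sim, f_sim, wall, x = 0, 0.0, 0, x0
    while T_sim < T_max:
        # Decorrelation Step: exact dynamics until the last T_corr values
        # X_{n-T_corr+1}, ..., X_n all lie in one metastable set.
        label = which_set(x)
        run = 0 if label is None else 1
        while run < T_corr:
            x = step(x, rng)
            T_sim += 1
            wall += 1
            f_sim += f(x)
            new = which_set(x)
            if new is not None and new == label:
                run += 1
            else:
                label, run = new, (0 if new is None else 1)

        # Dephasing Step (Fleming-Viot): walkers that leave S restart from a
        # uniformly chosen walker still in S. f_sim and T_sim are unchanged.
        walkers = [x] * N
        for _ in range(T_phase):
            walkers = [step(y, rng) for y in walkers]
            inside = [y for y in walkers if which_set(y) == label]
            if not inside:                 # hypothetical fallback: all walkers left
                inside = [x]
            walkers = [y if which_set(y) == label
                       else inside[rng.integers(len(inside))] for y in walkers]
        wall += T_phase

        # Parallel Step: loops (ii)-(iii) with polling time T_poll.
        tau_acc, M = 0, 0
        while True:
            M += 1
            K, window = None, []
            for i in range(N):             # replicas scanned in index order
                y, path = walkers[i], []
                for _ in range(T_poll):
                    y = step(y, rng)
                    path.append(y)
                    if which_set(y) != label:
                        break              # first exit time tau^K reached
                walkers[i] = y
                window.append(path)
                if which_set(y) != label:
                    K = i                  # smallest (0-based) index that left S
                    break                  # replicas after K are not used
            f_sim += sum(f(y) for path in window for y in path)
            if K is None:
                tau_acc += N * T_poll
            else:
                # 0-based K: K full windows before it, i.e. (K-1) T_poll 1-based.
                tau_acc += K * T_poll + len(window[K])
                break
        T_sim += tau_acc
        wall += M * T_poll
        x = window[K][-1]                  # X_acc: exit position of replica K
    return f_sim / T_sim, T_sim, T_sim / wall

def step(x, rng):
    """Toy lazy reflecting walk on {0,...,5}; its equilibrium is uniform."""
    u = rng.random()
    if u < 0.25:
        return max(x - 1, 0)
    if u < 0.5:
        return min(x + 1, 5)
    return x

which_set = lambda x: 0 if x <= 2 else 1   # S_1 = {0,1,2}, S_2 = {3,4,5}
rng = np.random.default_rng(0)
est, T_sim, speedup = parrep(step, which_set, lambda x: x, 0, N=4, T_corr=10,
                             T_phase=10, T_poll=5, T_max=100_000, rng=rng)
print(est, speedup)   # est should be near the exact mean 2.5
```

Each unit of $T_{\mathrm{sim}}$ contributes exactly one term to $f_{\mathrm{sim}}$: the exit point $X^K_{\tau^K}$ is counted in the Parallel Step and not again in the next Decorrelation Step, whose sum starts at $T_{\mathrm{sim}} + 1$.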

Some remarks on Algorithm 2 are in order.

• The Decorrelation Step. The purpose of the Decorrelation Step is to
reach the QSD in some metastable set. Indeed, the Decorrelation Step
terminates exactly when $(X_n)_{n\ge 0}$ has spent $T_{\mathrm{corr}}$ consecutive time steps
in some metastable set $S$, so the position of $(X_n)_{n\ge 0}$ at the end of the
Decorrelation Step can be considered an approximate sample from $\nu$, the
QSD in $S$. The error in this approximation is controlled by the parameter
$T_{\mathrm{corr}}$. Larger values of $T_{\mathrm{corr}}$ lead to increased accuracy but lessened
efficiency; see the numerical tests in Section 4 below, in particular Figures 4
and 6. During the Decorrelation Step, the dynamics of $(X_n)_{n\ge 0}$ is exact,
so the contribution to $f_{\mathrm{sim}}$ from the Decorrelation Step is exact.

Figure 3. Visualization of the Parallel Step of Algorithm 2. [Graphic: replica
trajectories plotted against time, with ticks at $T_{\mathrm{poll}}, \ldots, (M-1)T_{\mathrm{poll}}, MT_{\mathrm{poll}}$
and accumulated times $0, NT_{\mathrm{poll}}, 2NT_{\mathrm{poll}}, \ldots, (M-1)NT_{\mathrm{poll}}, \tau_{\mathrm{acc}}$;
$\tau_{\mathrm{acc}}$ = sum of lengths of bold lines = $(M-1)NT_{\mathrm{poll}} + (K-1)T_{\mathrm{poll}} + [\tau^K - (M-1)T_{\mathrm{poll}}]$;
$X_{\mathrm{acc}}$ = position of $(X_n^K)_{n\ge 0}$ at the cross (time $\tau^K$).] The
crosses represent exits from $S$. After $M$ loops internal to the Parallel
Step, two of the replicas leave $S$, with $(X_n^K)_{n\ge 0}$, the one among
these having the smallest index $K$, leaving at time $\tau^K$. The
trajectories of all the replicas can be concatenated into a single long
trajectory of length $\tau_{\mathrm{acc}}$. This single long trajectory is obtained by
running through the columns of width $T_{\mathrm{poll}}$ from top to bottom,
starting at the far left, in the order $1, 2, \ldots, N+1, N+2, \ldots$
indicated. The time marginals of this long trajectory (except its right
endpoint) are all distributed according to the QSD in $S$.

• The Dephasing Step. The Dephasing Step requires sampling $N$ iid copies
of the QSD in $S$, where $S$ is the metastable set from the end of the
Decorrelation Step. The practitioner has flexibility in sampling these points.
Essentially, one has to sample $N$ endpoints of trajectories of $(X_n)_{n\ge 0}$ that
have remained in $S$ for a long enough time, with this time being controlled
by the parameter $T_{\mathrm{phase}}$. For example, the Dephasing Step can be done
with rejection sampling, keeping trajectories which have remained in $S$ for
time $T_{\mathrm{phase}}$. Alternatively, the QSD samples may be obtained via
techniques related to the Fleming-Viot process; for details see [?] and [2]. This
technique can be summarized as follows: $N$ replicas of $(X_n)_{n\ge 0}$, all starting
in $S$, are independently evolved until one or several leave $S$; then each
replica which left $S$ is restarted from the current position of another replica
still inside $S$, chosen uniformly at random. After time $T_{\mathrm{phase}}$ this procedure
stops and the current positions of the replicas are used as the $N$ required
samples of $\nu$.

Under mild mixing conditions, convergence to the QSD is very fast.
More precisely, the conditional distributions on the right hand side of (2.1)
converge to $\nu$ geometrically fast in total variation norm [?]. An analysis of the error
associated with not exactly reaching the QSD will be the focus of another
work. For an analysis of the error associated with not reaching the QSD
in the original continuous-in-time version of the algorithm, see [13]. In the
metastable setting considered here, the average time to (approximately)
reach the QSD in $S$ is assumed to be much smaller than the average time, starting
at the QSD, to leave $S$. Indeed, this assumption can be considered the very
definition of metastability. Gains in efficiency in ParRep are limited by the
degree of metastability; see [2] and the discussion in Section 4 below.

It is emphasized that $f_{\mathrm{sim}}$ and $T_{\mathrm{sim}}$ are left unchanged during the
Dephasing Step. Contributions to $f_{\mathrm{sim}}$ and $T_{\mathrm{sim}}$ come only from the
Decorrelation and Parallel Steps.

• The Parallel Step. The purpose of the Parallel Step is twofold. First,
it simulates an exit event from $S$, the metastable set from the end of the
Decorrelation Step, starting from the QSD in $S$. This is consistent with
the exit event that would have been observed if, in the Decorrelation Step,
$(X_n)_{n\ge 0}$ had been allowed to continue evolving until leaving $S$:

Theorem 3. (Proposition 4.5 of [1].) Suppose the QSD sampling in the
Dephasing Step of Algorithm 2 is exact. Then in the Parallel Step,
$(\tau_{\mathrm{acc}}, X_{\mathrm{acc}}) \sim (\tau, X_\tau)$, where $X_0 \sim \nu$, with $\nu$ the QSD in $S$ and
$\tau := \min\{n \ge 0 : X_n \notin S\}$.

The gain in efficiency in ParRep, compared to direct serial simulation,
comes from the use of parallel processing in the Parallel Step. The wall-clock
time speedup (the ratio of average serial simulation time to the
ParRep simulation time of the exit event) scales like $N$, though the gain
in efficiency in ParRep as a whole depends also on $T_{\mathrm{corr}}$, $T_{\mathrm{phase}}$ and the
degree of metastability of the sets in $\mathcal{S}$.

Second, the Parallel Step includes a contribution to $f_{\mathrm{sim}}$. As the fine
scale dynamics of $(X_n)_{n\ge 0}$ in $S$ are not retained in the Parallel Step, this
contribution is not exact. It is, however, consistent on average, which is
sufficient for the computation of equilibrium averages. This can be
understood as follows. Concatenate all the trajectories of all the replicas into a
single long trajectory by following the procedure indicated in Figure 3. The
resulting trajectory has a probability law that is of course different from
that of $(X_n)_{n\ge 0}$ starting from the QSD $\nu$ in $S$. However, in light of
Definition 1, the time marginals of this trajectory (except the right endpoint)
are all distributed according to $\nu$. Moreover, from Theorem 3, the total
length of this concatenated trajectory has the same law as that of $(X_n)_{n\ge 0}$
started from the QSD in $S$ and stopped at the first exit time from $S$. So
by linearity of expectation, the contribution to $f_{\mathrm{sim}}$ from the Parallel Step
is consistent on average. See the Appendix for proofs of these statements
in an idealized setting.

• Other remarks. The parameter $T_{\mathrm{poll}}$ is a polling time at which the
possibly asynchronous parallel processors in the Parallel Step resynchronize.
For the Parallel Step to be finished correctly, one has to wait until the
first $K$ processors have completed $MT_{\mathrm{poll}}$ time steps. If the processors are
nearly synchronous or communication between them is cheap, one can take
$T_{\mathrm{poll}} = 1$.
The metastable sets $\mathcal{S}$ need not be known a priori. In many applications,
they can be identified on the fly; for example, when the metastable sets are
the basins of attraction of a potential energy, they can be found efficiently
on the fly by gradient descent. The reader is referred to [2] as well as [?] and
references therein for examples of successful applications of related versions
of ParRep in this setting.
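The rejection-sampling variant of the Dephasing Step mentioned in the remarks above can be sketched as follows. It assumes a simulator `step(x, rng)` and a membership test `in_S(x)` (both hypothetical names, as is `dephase_rejection`): trajectories started at the current position are kept only if they stay in $S$ for $T_{\mathrm{phase}}$ steps, and their endpoints are returned as approximate QSD samples. The usage check compares the empirical histogram with the QSD computed as a Perron eigenvector for a toy lazy walk on $\{0,\ldots,5\}$ with $S = \{0,1,2\}$; the tolerance is a loose statistical one.

```python
import numpy as np

def dephase_rejection(step, in_S, x, N, T_phase, rng, max_tries=10**7):
    """Return N endpoints of independent trajectories from x that stay in S
    for T_phase steps (approximate iid samples of the QSD in S)."""
    samples, tries = [], 0
    while len(samples) < N:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("acceptance rate too low; S may not be metastable")
        y = x
        for _ in range(T_phase):
            y = step(y, rng)
            if not in_S(y):
                break                      # trajectory left S: reject it
        else:
            samples.append(y)              # loop finished without leaving S
    return samples

def step(x, rng):
    """Toy lazy reflecting walk on {0,...,5}."""
    u = rng.random()
    if u < 0.25:
        return max(x - 1, 0)
    if u < 0.5:
        return min(x + 1, 5)
    return x

rng = np.random.default_rng(1)
samples = dephase_rejection(step, lambda y: y <= 2, 0, 2000, 30, rng)
hist = np.bincount(samples, minlength=3)[:3] / len(samples)

# Reference QSD: normalized left Perron vector of P restricted to S = {0,1,2}.
Q = np.array([[0.75, 0.25, 0.0], [0.25, 0.5, 0.25], [0.0, 0.25, 0.5]])
vals, vecs = np.linalg.eig(Q.T)
nu = np.abs(vecs[:, np.argmax(vals.real)].real)
nu /= nu.sum()
print(hist, nu)
```

Rejection sampling is simple but its cost grows like the inverse survival probability over $T_{\mathrm{phase}}$ steps, which is one reason the Fleming-Viot technique is preferred in the numerical tests.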


4. Numerical tests 


4.1. Example 1: Entropic barrier. Consider the Markov chain from Figure 1
on state space $\{-1,-2,\ldots,-100\}^2 \cup \{1,2,\ldots,200\}^2$. The Markov chain evolves
according to a random walk: at each time step it moves one unit up, down, left
or right, each with probability 1/4, provided the result is inside state space; if not,
the move is rejected and the position stays the same. There is one exception: if
the current position is $(-1,1)$ or $(-1,100)$ and a move to the right is proposed,
then the next position is $(1,1)$ or $(1,100)$, respectively; and if the current position
is $(1,1)$ or $(1,100)$ and a move to the left is proposed, then the next position is
$(-1,1)$ or $(-1,100)$, respectively. This Markov chain is ergodic with respect to the
uniform distribution $\mu_{\mathrm{unif}}$ on state space.

ParRep was performed on this system with $S_1 := \{-1,-2,\ldots,-100\}^2$,
$S_2 := \{1,2,\ldots,200\}^2$, and $\mathcal{S} = \{S_1, S_2\}$. Parameters were always chosen so that
$T_{\mathrm{corr}} = T_{\mathrm{phase}}$ and $T_{\mathrm{corr}}(S_2) = 4T_{\mathrm{corr}}(S_1)$, and QSD samples from the Dephasing Step were
obtained using the Fleming-Viot-based technique described above.

With $N = 100$ replicas and various values of $T_{\mathrm{corr}}(S_1)$, ParRep was used to
obtain average $x$- and $y$-coordinates with respect to $\mu_{\mathrm{unif}}$ as well as the
$\mu_{\mathrm{unif}}$-probability to be in the upper half of the right hand side box, denoted by:



Here $1_A$ denotes the indicator function of $A$. See Figure 4. Also computed was
the average time speedup, namely $T_{\mathrm{sim}}$ divided by the “wall clock time,” defined
as follows. Like $T_{\mathrm{sim}}$, the wall clock time starts at zero. It increases by 1 during
each time step of $(X_n)_{n\ge 0}$ in the Decorrelation Step (consistent with $T_{\mathrm{sim}}$), while
it increases by $MT_{\mathrm{poll}}$ in the Parallel Step (unlike $T_{\mathrm{sim}}$, which increases by $\tau_{\mathrm{acc}}$).
The wall clock time also increases by $T_{\mathrm{phase}}$ during the Dephasing Step (where $T_{\mathrm{sim}}$
does not increase at all). Informally, the wall clock time corresponds to true clock
time in an idealized setting where all the processors always compute one time step
of $(X_n)_{n\ge 0}$ in exactly 1 unit of time, and communication between processors takes
zero time. As $T_{\mathrm{corr}}$ increases, the time speedup decreases, but accuracy increases.

Figure 5 shows the dependence of time speedup on the number of replicas, $N$,
when $T_{\mathrm{corr}}(S_1) = 6000$. To illuminate the dependence of time speedup on $N$,
Figure 5 also includes the average (total) number of decorrelation steps, parallel
steps, and parallel loops (i.e., loops internal to the Parallel Step; in the notation
of Algorithm 2, there are $M$ loops internal to the Parallel Step). As $N$ increases,
the number of parallel loops decreases sharply, while the numbers of parallel steps
and decorrelation steps remain nearly constant. Thus, with increasing $N$ the wall
clock time spent in the Parallel Step falls quickly. The time speedup, however, is
limited by the wall clock time spent in the Decorrelation Step, and so it levels off
with increasing $N$. The value of $N$ at which this leveling off occurs depends on the
degree of metastability in the problem, or slightly more precisely, the ratios, over
all $S \in \mathcal{S}$, of the time scale for leaving $S$ to the time scale for reaching the QSD
in $S$. In the limit as these ratios approach infinity, the time speedup grows like $N$.
See [2] for a discussion of this issue in a continuous time version of ParRep.

Figure 4. Equilibrium average values and (average) time speedup
vs. $T_{\mathrm{corr}}(S_1)$ in ParRep simulations of the Markov chain from
Example 1, with $N = 100$. The straight lines correspond to exact
values. ParRep simulations were stopped when $T_{\mathrm{sim}}$ first exceeded
$5 \times 10^9$, and error bars are standard deviations obtained from 100
independent trials.

Figure 5. Average time speedup factor, number of decorrelation
steps, number of parallel steps, and number of parallel loops in
ParRep simulations of the Markov chain from Example 1, with
$T_{\mathrm{corr}}(S_1) = 6000$. ParRep simulations were stopped when $T_{\mathrm{sim}}$
first exceeded $10^8$, and error bars are standard deviations obtained
from 100 independent trials.

4.2. Example 2: Energetic barrier. Consider the Markov chain from Figure 2
on state space $\{1,\ldots,60\}$. The Markov chain evolves according to a biased random
walk: if $X_n = x$, then with probability $p_x$, $X_{n+1} = \max\{1, x-1\}$, while with
probability $1 - p_x$, $X_{n+1} = \min\{60, x+1\}$. Here,

$$p_x = \begin{cases} 0.6, & x \in \{1,\ldots,15\}, \\ 0.4, & x \in \{16,\ldots,30\}, \\ 0.65, & x \in \{31,\ldots,45\}, \\ 0.35, & x \in \{46,\ldots,60\}. \end{cases}$$

The equilibrium distribution $\mu_{\mathrm{bias}}$ of this Markov chain can be explicitly computed.
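Since the chain in Example 2 is a birth-death chain, $\mu_{\mathrm{bias}}$ follows from detailed balance, $\mu_{\mathrm{bias}}(x)(1-p_x) = \mu_{\mathrm{bias}}(x+1)\,p_{x+1}$ for $x = 1, \ldots, 59$ (the self-loops at $x = 1$ and $x = 60$ do not affect this). The sketch below is an illustration, not necessarily the computation used for the figures: it builds $\mu_{\mathrm{bias}}$ this way, checks stationarity against the transition matrix, and evaluates the mean of $x$ and the probability of $\{31,\ldots,60\}$. The helper name `p_left` is hypothetical.

```python
import numpy as np

def p_left(x):
    """Probability p_x of moving left from x in Example 2."""
    if x <= 15:
        return 0.6
    if x <= 30:
        return 0.4
    if x <= 45:
        return 0.65
    return 0.35

L = 60
w = np.ones(L)                    # unnormalized weights, w[i] for state x = i + 1
for i in range(L - 1):
    x = i + 1
    w[i + 1] = w[i] * (1 - p_left(x)) / p_left(x + 1)   # detailed balance
mu_bias = w / w.sum()

# Transition matrix with the reflecting rules max{1, x-1} and min{60, x+1}.
P = np.zeros((L, L))
for i in range(L):
    x = i + 1
    P[i, max(1, x - 1) - 1] += p_left(x)
    P[i, min(L, x + 1) - 1] += 1 - p_left(x)

states = np.arange(1, L + 1)
mean_x = mu_bias @ states               # average of x under mu_bias
prob_right = mu_bias[30:].sum()         # mu_bias({31, ..., 60})
print(np.allclose(mu_bias @ P, mu_bias), mean_x, prob_right)
```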
Figure 6. Equilibrium average values and (average) time speedup
vs. $T_{\mathrm{corr}}(S_3)$ in ParRep simulations of the Markov chain from
Example 2, with $N = 100$. The straight lines correspond to exact
values. ParRep simulations were stopped when $T_{\mathrm{sim}}$ first exceeded
$10^9$, and error bars are standard deviations obtained from 100
independent trials. For the smallest value of $T_{\mathrm{corr}}(S_3)$, the Markov
chain is typically close to the edges of $S_1$, $S_2$ or $S_3$, which results
in shorter parallel steps and thus a smaller time speedup.

ParRep simulations were performed on this system with $S_1 := \{1,\ldots,15\}$,
$S_2 := \{16,\ldots,45\}$, $S_3 := \{46,\ldots,60\}$ and $\mathcal{S} = \{S_1, S_2, S_3\}$. Parameters were always
chosen so that $T_{\mathrm{corr}} = T_{\mathrm{phase}}$, with $T_{\mathrm{corr}}(S_1)$ and $T_{\mathrm{corr}}(S_2)$ fixed multiples of
$T_{\mathrm{corr}}(S_3)$, and QSD samples from the Dephasing Step were again obtained using
the Fleming-Viot-based technique.

With $N = 100$ replicas and various values of $T_{\mathrm{corr}}(S_3)$, ParRep was used to
obtain the average $x$-coordinate with respect to $\mu_{\mathrm{bias}}$ as well as the
$\mu_{\mathrm{bias}}$-probability to be in the right half of the interval, denoted by:

$$\langle x \rangle_{\mu_{\mathrm{bias}}}, \qquad \langle 1_{x \in [31,60]} \rangle_{\mu_{\mathrm{bias}}}.$$

Also computed was the time speedup, defined exactly as above. Simulations were
stopped when $T_{\mathrm{sim}}$ first exceeded $10^9$. See Figure 6. Again, accuracy increases
with $T_{\mathrm{corr}}$, but the time speedup decreases with $T_{\mathrm{corr}}$.

Figure 7. Average time speedup factor, number of decorrelation
steps, number of parallel steps, and number of parallel loops in
ParRep simulations of the Markov chain from Example 2, when
$T_{\mathrm{corr}}(S_3) = 60$. ParRep simulations were stopped when $T_{\mathrm{sim}}$ first
exceeded $10^8$, and error bars are standard deviations obtained from
100 independent trials.
Figure 7 shows the dependence of time speedup on the number of replicas, $N$,
when $T_{\mathrm{corr}}(S_3) = 60$. Also plotted are the average number of decorrelation steps,
parallel steps, and parallel loops. The results are similar to Example 1, though the
degree of metastability and time speedup are much larger.

5. Conclusion 

A new algorithm, ParRep, for computing equilibrium averages of Markov chains
is presented. The algorithm requires no knowledge about the equilibrium
distribution of the Markov chain. Gains in efficiency are obtained by asynchronous parallel
processing. For these gains to be achievable in practice, the Markov chain must
possess some metastable sets. These sets need not be known a priori, but they should
be identifiable on the fly; for example, in many applications in computational
chemistry, the metastable sets can be basins of attraction of a potential energy, identified
on the fly by gradient descent. When metastable sets are present, the gains in
efficiency are limited by the degree of metastability. See [?] for a discussion and an
application of a related version of ParRep in this setting.

Applications in computational chemistry seem numerous. Nearly all popular
stochastic models of molecular dynamics are Markovian. Even when these models are
continuous in time, to actually simulate the models a time discretization is required
and the result is a Markov chain. Generically, these models have many metastable
sets associated with different geometric arrangements of atoms at distinct local
minima of the potential energy or free energy. Often the equilibrium
distributions of these models are unknown (for example, if external forces are present),
yet it is still of great interest to sample equilibrium. Because of metastability,
it is often impractical or impossible to sample equilibrium with direct simulation.
ParRep may put such computations within reach.

Appendix 

In this Appendix, consistency of ParRep is proved in an idealized setting. 

5.1. Idealized setting, assumptions, and main result. Recall that $(X_n)_{n\ge 0}$ is a
Markov chain on a standard Borel state space $(\Omega, \mathcal{F})$. The collection $\mathcal{S} \subset \mathcal{F}$ of disjoint
sets is assumed finite, with a unique QSD $\nu$ associated to each metastable set $S$. All
probabilities, which may be associated to different spaces and random processes or
variables, will be denoted by $\mathbb{P}$; the meaning will be clear from context. Probabilities
associated with the initial distribution $\xi$ are denoted by $\mathbb{P}_\xi$. (If $\xi = \delta_x$ then $\mathbb{P}_x$ is
written instead.) The corresponding expectations are written $\mathbb{E}$, $\mathbb{E}_\xi$, or $\mathbb{E}_x$. The
norm $\|\cdot\|$ will always be the total variation norm.

In all the analysis below, an idealized setting is assumed. It is defined by two
conditions: the QSD is sampled exactly in the Dephasing Step (Idealization 4), and
the QSD is reached exactly by time $T_{\mathrm{corr}}$ (Idealization 5). These are idealizing
assumptions in the sense that, in practice, the QSD is never exactly reached.

Idealization 4. In the Dephasing Step, the points $x^1, \ldots, x^N$ are drawn
independently and exactly from the QSD $\nu$ in $S$.

Idealization 5. For each $S \in \mathcal{S}$ there is a time $T_{\mathrm{corr}} > 0$ such that, after spending
$T_{\mathrm{corr}}$ consecutive time steps in $S$, the Markov chain $(X_n)_{n\ge 0}$ is exactly distributed
according to $\nu$. That is, for every $S \in \mathcal{S}$ and every $A \in \mathcal{F}$ with $A \subset S$,

$$\mathbb{P}(X_{T_{\mathrm{corr}}} \in A \mid X_1 \in S, \ldots, X_{T_{\mathrm{corr}}} \in S) = \nu(A).$$

In practice, the word exactly must be replaced with approximately. The error
associated with not exactly reaching the QSD will not be studied here. See,
however, [13] for an analysis of this error in the continuous in time setting. Here, the
idealized setting seems necessary to connect the ParRep dynamics with those of the
original Markov chain. The idealizations allow these two dynamics to be
synchronized after reaching the QSD, which is crucial in the analysis below. In particular,
the analysis here cannot be modified in a simple way to allow for inexact
convergence to the QSD.

By Idealization 5, at the end of each Decorrelation Step $(X_n)_{n\ge 0}$ is distributed
exactly according to the QSD. By Idealization 4, the Parallel Step is exact:

Theorem 6. (Restated from [1].) Let Idealization 4 hold. Then in the Parallel
Step of Algorithm 2, $(\tau_{\mathrm{acc}}, X_{\mathrm{acc}}) \sim (\tau, X_\tau)$, where $X_0 \sim \nu$, with $\nu$ the QSD in $S$
and $\tau := \min\{n \ge 0 : X_n \notin S\}$. Moreover, $\tau$ is a geometric random variable with
parameter $p := \mathbb{P}_\nu(X_1 \notin S)$, and $\tau$ is independent of $X_\tau$.
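For a finite chain, the exit-time part of Theorem 6 can be checked with matrix algebra. Writing $Q$ for the transition matrix restricted to $S$ and $R$ for the block from $S$ to its complement, $\mathbb{P}_\nu(\tau > m) = \nu Q^m \mathbf{1}$ and $\mathbb{P}_\nu(\tau = m, X_\tau = y) = (\nu Q^{m-1} R)_y$. Since $\nu Q = (1-p)\nu$, both expressions factorize: $\tau$ is geometric with parameter $p$, and $X_\tau$ has law $\nu R / p$ whatever the value of $\tau$. The sketch uses a toy lazy random walk on $\{0,\ldots,5\}$ with $S = \{0,1,2\}$, an assumption made only for illustration.

```python
import numpy as np

n = 6
P = np.zeros((n, n))
for x in range(n):                      # toy lazy reflecting walk on {0,...,5}
    P[x, max(x - 1, 0)] += 0.25
    P[x, min(x + 1, n - 1)] += 0.25
    P[x, x] += 0.5
S, Sc = [0, 1, 2], [3, 4, 5]
Q, R = P[np.ix_(S, S)], P[np.ix_(S, Sc)]

vals, vecs = np.linalg.eig(Q.T)         # QSD = normalized left Perron vector of Q
nu = np.abs(vecs[:, np.argmax(vals.real)].real)
nu /= nu.sum()
p = 1.0 - Q.sum(axis=1) @ nu            # p = P_nu(X_1 not in S)
exit_law = (nu @ R) / p                 # law of X_tau

for m in range(1, 11):
    Qm1 = np.linalg.matrix_power(Q, m - 1)
    survival = nu @ Qm1 @ Q.sum(axis=1)                 # P_nu(tau > m)
    joint = nu @ Qm1 @ R                                # P_nu(tau = m, X_tau = .)
    assert np.isclose(survival, (1 - p) ** m)           # geometric tail
    assert np.allclose(joint, (1 - p) ** (m - 1) * p * exit_law)  # independence
print(p, exit_law)
```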

In particular, the first exit time from $S$, starting at the QSD, is a geometric
random variable which is independent of the exit position. This property is crucial
for proving consistency of the Parallel Step (see [1]), and will be useful below.

To prove the main result, a form of ergodicity for the original Markov chain is
required:

Assumption 7. The Markov chain $(X_n)_{n\ge 0}$ is uniformly ergodic: that is, there
exists a (unique) probability measure $\mu$ on $(\Omega, \mathcal{F})$ such that

$$\lim_{n\to\infty} \sup_{\xi} \|\mathbb{P}_\xi(X_n \in \cdot) - \mu\| = 0,$$

where the supremum is taken over all probability measures $\xi$ on $(\Omega, \mathcal{F})$.
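On a finite state space the supremum over $\xi$ in Assumption 7 is attained at point masses, because $\xi \mapsto \|\mathbb{P}_\xi(X_n \in \cdot) - \mu\|$ is convex; so the condition can be monitored through $d(n) = \max_x \|\mathbb{P}_x(X_n \in \cdot) - \mu\|$. The sketch assumes the convention $\|\eta\| = \sup_A |\eta(A)| = \frac12 \sum_y |\eta(y)|$ for the total variation norm (other conventions differ by a factor of 2) and uses a toy lazy walk on $\{0,\ldots,5\}$, whose equilibrium is uniform because its transition matrix is symmetric.

```python
import numpy as np

n = 6
P = np.zeros((n, n))
for x in range(n):                      # toy lazy reflecting walk on {0,...,5}
    P[x, max(x - 1, 0)] += 0.25
    P[x, min(x + 1, n - 1)] += 0.25
    P[x, x] += 0.5
mu = np.full(n, 1.0 / n)                # P is symmetric, hence doubly stochastic

d = []
Pn = np.eye(n)                          # row x of Pn is the law P_x(X_n in .)
for _ in range(301):
    d.append(0.5 * np.abs(Pn - mu).sum(axis=1).max())  # worst-case TV distance
    Pn = Pn @ P
d = np.array(d)
print(d[[0, 10, 50, 300]])
```

The sequence $d(n)$ is nonincreasing, and for a finite, irreducible, aperiodic chain it decays geometrically, so Assumption 7 holds automatically in that case.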

Next, a Doeblin-like condition is assumed:

Assumption 8. There exist $\alpha \in (0,1)$, $S \in \mathcal{S}$ with $\mu(S) > 0$, and a probability
measure $\lambda$ on $(\Omega, \mathcal{F})$ supported in $S$ such that the following holds: for all $x \in S$
and all $C \in \mathcal{F}$ with $C \subset S$,

$$\mathbb{P}_x(X_1 \in C) \ge \alpha \lambda(C).$$
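For a finite chain, one admissible choice of the constants in Assumption 8 (a standard construction, not necessarily the one intended here) is $\alpha = \sum_{y \in S} \min_{x \in S} P(x,y)$ and $\lambda(y) = \min_{x \in S} P(x,y)/\alpha$, provided $0 < \alpha < 1$. Then $P(x,y) \ge \alpha\lambda(y)$ for all $x, y \in S$, which implies the inequality for every $C \subset S$. The sketch computes these constants for a toy lazy walk on $\{0,\ldots,5\}$ with $S = \{0,1,2\}$ (an illustrative assumption) and checks the condition on every subset $C$.

```python
import itertools
import numpy as np

n = 6
P = np.zeros((n, n))
for x in range(n):                      # toy lazy reflecting walk on {0,...,5}
    P[x, max(x - 1, 0)] += 0.25
    P[x, min(x + 1, n - 1)] += 0.25
    P[x, x] += 0.5
S = [0, 1, 2]

m = P[np.ix_(S, S)].min(axis=0)         # m(y) = min over x in S of P(x, y)
alpha = m.sum()
lam = m / alpha                         # probability measure supported in S
print(alpha, lam)

# Check P_x(X_1 in C) >= alpha * lambda(C) for every x in S and C subset of S.
for r in range(1, len(S) + 1):
    for C in itertools.combinations(range(len(S)), r):
        C = list(C)
        for i in range(len(S)):
            lhs = P[S[i], [S[j] for j in C]].sum()
            assert lhs >= alpha * lam[C].sum() - 1e-12
```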

Finally, a lower bound is assumed for escape rates from metastable states.

Assumption 9. There exists $\delta > 0$ such that for all $S \in \mathcal{S}$,

$$\mathbb{P}_\nu(X_1 \notin S) \ge \delta.$$


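This simply says that none of the metastable sets are absorbing. The following
is the main result of this Appendix:

Theorem 10. Let Idealizations 4 and 5 and Assumptions 7, 8 and 9 hold. Then for any
probability measure $\xi$ on $(\Omega, \mathcal{F})$ and any bounded measurable function $f : \Omega \to \mathbb{R}$,

$$\mathbb{P}_\xi\left(\lim_{T_{\mathrm{sim}}\to\infty} \frac{f_{\mathrm{sim}}}{T_{\mathrm{sim}}} = \int f \, d\mu\right) = 1.$$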
The proof of Theorem 10 is in Section 5.2 below. It is emphasized that
Idealizations 4 and 5 and Assumptions 7, 8 and 9 are assumed to hold throughout the
remainder of the Appendix. Furthermore, for simplicity it is assumed that $T_{\mathrm{corr}}$ is the
same for each $S \in \mathcal{S}$.


5.2. Proof of main result. The first step in the proof is to show that Theorem 10
holds when the number of replicas is $N = 1$ (Sections 5.2.1 to 5.2.5). Then this will
be generalized to any number of replicas (Section 5.2.6). It is known (see Chapter 7
of [?]) that Assumption 7 is a sufficient condition for the following to hold:


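Lemma 11. There exists a (unique) measure $\mu$ on $(\Omega, \mathcal{F})$ such that for all
probability measures $\xi$ on $(\Omega, \mathcal{F})$ and all bounded measurable functions $f : \Omega \to \mathbb{R}$,

$$\mathbb{P}_\xi\left(\lim_{n\to\infty} \frac{f(X_0) + \ldots + f(X_{n-1})}{n} = \int f \, d\mu\right) = 1.$$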


5.2.1. The ParRep process with one replica. Consider a stochastic process $(\tilde X_n)_{n\ge 0}$
which represents the underlying process in Algorithm 2 when the number of replicas
is $N = 1$. Loosely speaking, $(\tilde X_n)_{n\ge 0}$ evolves like $(X_n)_{n\ge 0}$ in the Decorrelation
Step, and like $(X_n^1)_{n\ge 0}$ in the Parallel Step (and it does not evolve during the
Dephasing Step). More precisely, $(\tilde X_n)_{n\ge 0}$ can be defined in the following way
(writing $S$ for a generic element of $\mathcal{S}$):

1. If $\tilde X_n = x$ and $x \in S$, do the following. If $\tilde X_j \notin S$ for some
$j \in \{n-1, n-2, \ldots, \max\{0, n-T_{\mathrm{corr}}+1\}\}$, pick $x'$ from $\mathbb{P}_x(X_1 \in \cdot)$, let
$\tilde X_{n+1} = x'$, update $n = n + 1$, and repeat. Otherwise, proceed to 2.

2. If $\tilde X_n = x$ and $x \in S$, pick $z$ from the QSD in $S$, pick $x'$ from $\mathbb{P}_z(X_1 \in \cdot)$,
and let $\tilde X_{n+1} = x'$. If $x' \notin S$, update $n = n + 1$ and return to 1. Otherwise,
update $n = n + 1$ and proceed to 3.

3. If $\tilde X_n = x$ and $x \in S$, pick $x'$ from $\mathbb{P}_x(X_1 \in \cdot)$, and let $\tilde X_{n+1} = x'$. If
$x' \in S$, update $n = n + 1$ and repeat. Otherwise, update $n = n + 1$ and
return to 1.

Note that $(\tilde X_n)_{n\ge 0}$ is not Markovian, since the next value of the process depends
on the history of the process. Idealization 5, however, implies that $\tilde X_n$ and $X_n$ have
the same law for each $n \ge 0$:


Lemma 12. If $\tilde X_0 \sim X_0$, then for every $n \ge 0$, $\tilde X_n \sim X_n$.


5.2.2. An extended Markovian process. Consider next an extended Markovian process $(Y_n)_{n\ge0}$ with values in $\Omega \times \mathbb{Z}$, such that $(\pi_1(Y_n))_{n\ge0}$ has the same law as $(\tilde X_n)_{n\ge0}$, where $\pi_i$ is projection onto the $i$th component:

$$\pi_1(x,t) = x, \qquad \pi_2(x,t) = t.$$

Loosely speaking, the second component of $(Y_n)_{n\ge0}$ is a counter indicating how many consecutive steps the process has spent in a given set $S \in \mathcal{S}$. The counter stops at $T_{corr}$, even if the process continues to survive in $S$. The first component of $(Y_n)_{n\ge0}$ evolves exactly like $(X_n)_{n\ge0}$, except when the second component is $T_{corr} - 1$, in which case the process is evolved one time step starting from a sample of the QSD in $S$. It is convenient to describe $(Y_n)_{n\ge0}$ more precisely as follows (writing $S$ for a generic element of $\mathcal{S}$):
a generic element of 5): 
1. If $Y_n = (y,t)$ with $y \in S$ and $0 \le t < T_{corr} - 1$, pick $y'$ from $\mathbb{P}_y(X_1 \in \cdot)$. If $y' \in S$, let $Y_{n+1} = (y', t+1)$; otherwise let $Y_{n+1} = (y', 0)$.

2. If $Y_n = (y, T_{corr} - 1)$ with $y \in S$, pick $z$ from the QSD $\nu$ in $S$, and pick $y'$ from $\mathbb{P}_z(X_1 \in \cdot)$. If $y' \in S$, let $Y_{n+1} = (y', T_{corr})$; otherwise let $Y_{n+1} = (y', 0)$.

3. If $Y_n = (y, T_{corr})$ with $y \in S$, pick $y'$ from $\mathbb{P}_y(X_1 \in \cdot)$. If $y' \in S$, let $Y_{n+1} = (y', T_{corr})$; otherwise let $Y_{n+1} = (y', 0)$.
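The three rules above can be simulated directly on a finite state space. A minimal sketch follows, assuming that the metastable sets partition the state space (encoded by a hypothetical `labels` array) and that each QSD is the left Perron eigenvector of the chain restricted to its set, as for an irreducible finite chain; `step_Y`, `qsd` and the example matrix are illustrative names, not part of the text. The first components of the simulated path play the role of $(\tilde X_n)_{n\ge0}$ (Lemma 13), since the counter starts at zero.

```python
import numpy as np

def qsd(P, S):
    """Normalized left Perron eigenvector of P restricted to S (assumed irreducible)."""
    Q = P[np.ix_(S, S)]
    vals, vecs = np.linalg.eig(Q.T)
    nu = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return nu / nu.sum()

def step_Y(y, t, P, labels, qsds, T_corr, rng):
    """One transition (y, t) -> (y', t') of the extended process (Y_n)."""
    S = labels[y]                               # index of the set containing y
    if t == T_corr - 1:                         # rule 2: restart from the QSD in S
        states, nu = qsds[S]
        start = rng.choice(states, p=nu)
        new_t = T_corr
    else:                                       # rules 1 and 3: ordinary step from y
        start = y
        new_t = min(t + 1, T_corr)
    y_new = int(rng.choice(len(P), p=P[start]))
    return (y_new, new_t) if labels[y_new] == S else (y_new, 0)

# Hypothetical two-well example; labels[x] is the index of the set containing x.
P = np.array([[0.85, 0.10, 0.05, 0.00],
              [0.10, 0.80, 0.00, 0.10],
              [0.05, 0.00, 0.80, 0.15],
              [0.00, 0.05, 0.15, 0.80]])
labels = np.array([0, 0, 1, 1])
T_corr = 3
qsds = {s: (np.flatnonzero(labels == s), qsd(P, np.flatnonzero(labels == s)))
        for s in np.unique(labels)}

rng = np.random.default_rng(1)
path = [(0, 0)]                                 # Y_0 = (x, 0): the counter starts at zero
for _ in range(2000):
    path.append(step_Y(*path[-1], P, labels, qsds, T_corr, rng))
print(path[:10])
```

Along any such path the counter is reset to 0 exactly when the set changes, and otherwise increases by one until it saturates at $T_{corr}$.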

The process $(Y_n)_{n\ge0}$ is Markovian on the state space $(\Omega_Y, \mathcal{F}_Y)$, where

$$\Omega_Y = \Omega \times \mathbb{Z}, \qquad \mathcal{F}_Y = \mathcal{F} \otimes 2^{\mathbb{Z}}.$$

The following result is immediate from construction. 

Lemma 13. If $\pi_1(Y_0) \sim \tilde X_0$ and $\pi_2(Y_0) = 0$, then $(\pi_1(Y_n))_{n\ge0} \sim (\tilde X_n)_{n\ge0}$.

Note that for the processes to have the same law, the counter of the extended process must start at zero. Lemmas 12 and 13 give the following relationship between the extended process and the original Markov chain:


Lemma 14. If $\pi_1(Y_0) \sim X_0$ and $\pi_2(Y_0) = 0$, then for every $n \ge 0$, $\pi_1(Y_n) \sim X_n$.


5.2.3. Harris Chains. Let $(Z_n)_{n\ge0}$ be a Markov chain on a standard Borel state space $(E, \mathcal{E})$. The process $(Z_n)_{n\ge0}$ is a Harris chain if there exist $\varepsilon > 0$, $A, B \in \mathcal{E}$, and a probability measure $\rho$ on $(E, \mathcal{E})$ supported in $B$ such that

(i) For all $x \in E$, we have $\mathbb{P}_x(\inf\{n \ge 0 : Z_n \in A\} < \infty) > 0$;

(ii) For all $x \in A$ and $C \in \mathcal{E}$ with $C \subset B$, we have $\mathbb{P}_x(Z_1 \in C) \ge \varepsilon \rho(C)$.

See for instance Chapter 5 of [6]. Intuitively, starting at any point in $A$, with probability at least $\varepsilon$ the process is distributed according to $\rho$ after one time step. This allows ergodicity of the chain to be studied using ideas similar to the case where the state space is discrete. The trick is to consider an auxiliary process $(\bar Z_n)_{n\ge0}$ with values in $\bar E := E \cup \{\sigma\}$, where $\sigma$ corresponds to being distributed according to $\rho$ on $B$. More precisely:

1. If $\bar Z_n = x$ and $x \in E \setminus A$, pick $y$ from $\mathbb{P}_x(Z_1 \in \cdot)$ and let $\bar Z_{n+1} = y$.

2. If $\bar Z_n = x$ and $x \in A$: with probability $\varepsilon$, let $\bar Z_{n+1} = \sigma$; with probability $1 - \varepsilon$, pick $y$ from $(1-\varepsilon)^{-1}\left(\mathbb{P}_x(Z_1 \in \cdot) - \varepsilon\rho(\cdot)\right)$ and let $\bar Z_{n+1} = y$.

3. If $\bar Z_n = \sigma$, pick $x$ from $\rho(\cdot)$. Then pick $\bar Z_{n+1}$ as in 1 and 2.

So $(\bar Z_n)_{n\ge0}$ is Markov on $(\bar E, \bar{\mathcal{E}})$, where $\bar{\mathcal{E}}$ consists of sets of the form $C$ and $C \cup \{\sigma\}$ for $C \in \mathcal{E}$. The following result (see [6]) relates the auxiliary process to the original process.


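Lemma 15. Let $f : E \to \mathbb{R}$ be bounded and measurable, and define $\bar f : \bar E \to \mathbb{R}$ by

$$\bar f(x) = \begin{cases} f(x), & x \in E, \\ \int_B f \, d\rho, & x = \sigma. \end{cases}$$

Then for any probability measure $\xi$ on $(E, \mathcal{E})$ and any $n \ge 0$,

$$\mathbb{E}_\xi[f(Z_n)] = \mathbb{E}_{\bar\xi}[\bar f(\bar Z_n)],$$

where $\bar\xi$ is the probability measure on $(\bar E, \bar{\mathcal{E}})$ defined by $\bar\xi(A) = \xi(A)$ for $A \in \mathcal{E}$, and $\bar\xi(\{\sigma\}) = 0$.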
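For a finite state space, the auxiliary chain is a Markov chain on $|E| + 1$ states, and Lemma 15 can be checked by exact matrix computation. A minimal sketch, assuming a hypothetical three-state matrix `P` together with choices $A = B = \{0, 1\}$, $\rho$ uniform on $B$ and $\varepsilon = 0.4$ that satisfy condition (ii); the last index stands for $\sigma$, and `split_chain` is an illustrative name.

```python
import numpy as np

def split_chain(P, A, rho, eps):
    """Transition matrix of the auxiliary chain on E ∪ {sigma}; sigma is the last index.

    Requires the minorization P[x, y] >= eps * rho[y] for x in A (rho supported in B).
    """
    m = len(P)
    Pbar = np.zeros((m + 1, m + 1))
    for x in range(m):
        if x in A:
            residual = P[x] - eps * rho          # (1 - eps) times the residual kernel
            if np.any(residual < -1e-12):
                raise ValueError("minorization condition (ii) fails")
            Pbar[x, :m] = residual
            Pbar[x, m] = eps                     # rule 2: jump to sigma with probability eps
        else:
            Pbar[x, :m] = P[x]                   # rule 1: ordinary step
    Pbar[m] = rho @ Pbar[:m]                     # rule 3: draw x ~ rho, then apply rules 1-2
    return Pbar

# Hypothetical example: A = B = {0, 1}, rho uniform on B, eps = 0.4.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
A = {0, 1}
rho = np.array([0.5, 0.5, 0.0])
eps = 0.4
Pbar = split_chain(P, A, rho, eps)

f = np.array([1.0, -2.0, 3.0])
fbar = np.append(f, rho @ f)                     # fbar(sigma) = integral of f against rho
xi = np.array([0.2, 0.3, 0.5])
xibar = np.append(xi, 0.0)                       # xibar({sigma}) = 0
for n in range(6):
    lhs = xi @ np.linalg.matrix_power(P, n) @ f
    rhs = xibar @ np.linalg.matrix_power(Pbar, n) @ fbar
    print(n, lhs, rhs)
```

The two columns printed agree, because replacing the mass at $\sigma$ by $\rho$ turns one step of the auxiliary chain into one step of the original chain.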


The following result gives sufficient conditions for the Harris chain to be ergodic. Note that the conditions are in terms of the auxiliary chain.
Lemma 16. Let $(Z_n)_{n\ge0}$ be a Harris chain on $(E, \mathcal{E})$ with auxiliary chain $(\bar Z_n)_{n\ge0}$. Assume that

$$\sum_{n=1}^{\infty} \mathbb{P}_\sigma(\bar Z_n = \sigma) = \infty$$

and

$$\text{g.c.d.}\{n \ge 1 : \mathbb{P}_\sigma(\bar Z_n = \sigma) > 0\} = 1.$$

Then there exists a (unique) measure $\eta$ on $(E, \mathcal{E})$ such that for any probability measure $\xi$ on $(E, \mathcal{E})$ and any bounded measurable function $f : E \to \mathbb{R}$,

$$\mathbb{P}_\xi\left(\lim_{n \to \infty} \frac{f(Z_0) + \cdots + f(Z_{n-1})}{n} = \int_E f \, d\eta\right) = 1.$$

Moreover, for any probability measure $\xi$ on $(E, \mathcal{E})$,

$$\lim_{n \to \infty} \|\mathbb{P}_\xi(Z_n \in \cdot) - \eta\| = 0.$$

A proof of Lemma 16 can be found in Chapter 5 of [6] and Chapter 7 of [5].


5.2.4. Ergodicity of the extended process. In Theorem 20 below, ergodicity of the extended process $(Y_n)_{n\ge0}$ is proved. Before proceeding, three preliminary results, Lemmas 17, 18 and 19 below, are required. Define

$$\tau = \inf\{n \ge 0 : \pi_2(Y_n) = 0\}. \tag{5.1}$$

From Lemma 14, $\tau$ can be thought of as the first time $n$ at which the law of $\pi_1(Y_n)$ synchronizes with that of $X_n$.


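Lemma 17. For any probability measure $\xi$ on $(\Omega_Y, \mathcal{F}_Y)$ and any $t \ge T_{corr} + 1$,

$$\mathbb{P}_\xi(\tau \le t) \ge \delta.$$

Proof. Let $t \ge T_{corr} + 1$ and define

$$\kappa = \inf\{n \ge 0 : \pi_2(Y_n) = T_{corr}\}.$$

Note that if $\tau > \kappa$, then $\kappa \le T_{corr}$ and so $\kappa + 1 \le t$. On the other hand, if $\tau < \kappa$, then $\tau \le T_{corr}$ and so $\tau \le t - 1 < t$. Thus,

$$\begin{aligned}
\mathbb{P}_\xi(\tau \le t) &= \mathbb{P}_\xi(\tau \le t \mid \tau < \kappa)\,\mathbb{P}_\xi(\tau < \kappa) + \mathbb{P}_\xi(\tau \le t \mid \tau > \kappa)\,\mathbb{P}_\xi(\tau > \kappa) \\
&= \mathbb{P}_\xi(\tau < \kappa) + \mathbb{P}_\xi(\tau \le t \mid \tau > \kappa)\,\mathbb{P}_\xi(\tau > \kappa) \\
&\ge \mathbb{P}_\xi(\tau < \kappa) + \mathbb{P}_\xi(\tau = \kappa + 1 \mid \tau > \kappa)\,\mathbb{P}_\xi(\tau > \kappa).
\end{aligned} \tag{5.2}$$

By Assumption 8, for any $(x,t) \in \Omega_Y$,

$$\begin{aligned}
\mathbb{P}_{(x,t)}(\tau = \kappa + 1 \mid \tau > \kappa) &= \sum_{k=0}^{T_{corr}} \mathbb{P}_{(x,t)}(\tau = k + 1 \mid \kappa = k, \tau > \kappa)\,\mathbb{P}_{(x,t)}(\kappa = k \mid \tau > \kappa) \\
&= \sum_{k=0}^{T_{corr}} \mathbb{P}_\nu(X_1 \notin S)\,\mathbb{P}_{(x,t)}(\kappa = k \mid \tau > \kappa) \\
&= \mathbb{P}_\nu(X_1 \notin S) \ge \delta,
\end{aligned} \tag{5.3}$$

where $\nu$ is the QSD in $S$, with $S \ni x$. Combining (5.2) and (5.3), and using the facts that $\delta \in (0,1]$ and $\tau \neq \kappa$ (so that $\mathbb{P}_\xi(\tau < \kappa) + \mathbb{P}_\xi(\tau > \kappa) = 1$),

$$\begin{aligned}
\mathbb{P}_\xi(\tau \le t) &\ge \mathbb{P}_\xi(\tau < \kappa) + \delta\,\mathbb{P}_\xi(\tau > \kappa) \\
&= \delta + (1 - \delta)\,\mathbb{P}_\xi(\tau < \kappa) \\
&\ge \delta.
\end{aligned}$$

□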


For the remainder of Section 5.2.4, fix $S \in \mathcal{S}$ satisfying Assumption 8 and define

$$S_Y = S \times \{T_{corr}\}. \tag{5.4}$$


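Lemma 18. Let $\xi$ be any probability measure on $(\Omega_Y, \mathcal{F}_Y)$ with support in $S \times \{0, \ldots, T_{corr}\}$. Then for all $n \ge T_{corr}$,

$$\mathbb{P}_\xi(Y_n \in S_Y) \ge \alpha^n.$$

Proof. By Assumption 8, $\mathbb{P}_x(X_1 \in S) \ge \alpha$ whenever $x \in S$. By definition of the extended process $(Y_n)_{n\ge0}$, the following holds. First, for any $x \in S$ and any $t \in \{0, \ldots, T_{corr} - 2\}$,

$$\mathbb{P}_{(x,t)}(Y_1 \in S \times \{t+1\}) = \mathbb{P}_x(X_1 \in S) \ge \alpha. \tag{5.5}$$

Second, for any $x \in S$,

$$\mathbb{P}_{(x,T_{corr}-1)}(Y_1 \in S_Y) = \int_S \mathbb{P}_y(X_1 \in S)\,\nu(dy) \ge \int_S \alpha\,\nu(dy) = \alpha. \tag{5.6}$$

Third, for any $x \in S$,

$$\mathbb{P}_{(x,T_{corr})}(Y_1 \in S_Y) = \mathbb{P}_x(X_1 \in S) \ge \alpha. \tag{5.7}$$

Let $n \ge T_{corr}$. For any $x \in S$ and $t \in \{0, \ldots, T_{corr}\}$, due to (5.5), (5.6) and (5.7),

$$\begin{aligned}
\alpha^n &\le \prod_{j=1}^{T_{corr}-t} \mathbb{P}_{(x,t)}\big(Y_j \in S \times \{t+j\} \,\big|\, Y_{j-1} \in S \times \{t+j-1\}\big) \times \prod_{j=T_{corr}-t+1}^{n} \mathbb{P}_{(x,t)}\big(Y_j \in S_Y \,\big|\, Y_{j-1} \in S_Y\big) \\
&= \mathbb{P}_{(x,t)}\Big(\bigcap_{j=1}^{T_{corr}-t} \{Y_j \in S \times \{t+j\}\} \cap \bigcap_{j=T_{corr}-t+1}^{n} \{Y_j \in S_Y\}\Big) \\
&\le \mathbb{P}_{(x,t)}(Y_n \in S_Y),
\end{aligned}$$

where by convention the product and intersection from $j = T_{corr} - t + 1$ to $j = n$ do not appear above if $n = T_{corr}$ and $t = 0$. Integrating this bound against $\xi$ gives the result. □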
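On a finite state space the extended process has finitely many states $(x,t)$, so $\mathbb{P}_{(x,t)}(Y_n \in S_Y)$ can be computed exactly from the $n$th power of its transition matrix. A minimal sketch, assuming the metastable sets partition the state space, taking $\alpha = \min_{x \in S} \mathbb{P}_x(X_1 \in S)$ as the constant of Assumption 8, and using a hypothetical four-state chain; `extended_kernel` and `qsd_vector` are illustrative names. It checks the bound of Lemma 18 for every point mass on $S \times \{0, \ldots, T_{corr}\}$.

```python
import numpy as np

def qsd_vector(P, S):
    """QSD in S as a full-length vector (zero off S), via the left Perron eigenvector of P[S, S]."""
    Q = P[np.ix_(S, S)]
    vals, vecs = np.linalg.eig(Q.T)
    v = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    nu = np.zeros(len(P))
    nu[S] = v / v.sum()
    return nu

def extended_kernel(P, labels, T_corr):
    """Transition matrix of (Y_n); state (x, t) has index x * (T_corr + 1) + t."""
    m, L = len(P), T_corr + 1
    nus = {s: qsd_vector(P, np.flatnonzero(labels == s)) for s in np.unique(labels)}
    K = np.zeros((m * L, m * L))
    for x in range(m):
        s = labels[x]
        for t in range(L):
            if t == T_corr - 1:
                row, stay_t = nus[s] @ P, T_corr          # rule 2: one step from the QSD in S
            else:
                row, stay_t = P[x], min(t + 1, T_corr)    # rules 1 and 3: one step from x
            for y in range(m):
                new_t = stay_t if labels[y] == s else 0   # the counter resets on leaving S
                K[x * L + t, y * L + new_t] += row[y]
    return K

# Hypothetical two-well chain; labels[x] is the index of the set containing x.
P = np.array([[0.85, 0.10, 0.05, 0.00],
              [0.10, 0.80, 0.00, 0.10],
              [0.05, 0.00, 0.80, 0.15],
              [0.00, 0.05, 0.15, 0.80]])
labels = np.array([0, 0, 1, 1])
T_corr = 3
L = T_corr + 1
K = extended_kernel(P, labels, T_corr)

S = np.flatnonzero(labels == 0)
alpha = P[np.ix_(S, S)].sum(axis=1).min()    # largest alpha with P_x(X_1 in S) >= alpha on S
S_Y = [y * L + T_corr for y in S]            # indices of S x {T_corr}
starts = [x * L + t for x in S for t in range(L)]
worst = {}
for n in range(T_corr, T_corr + 6):
    Kn = np.linalg.matrix_power(K, n)
    worst[n] = Kn[np.ix_(starts, S_Y)].sum(axis=1).min()   # min over (x, t) of P(Y_n in S_Y)
    print(n, worst[n], alpha ** n)
```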


Lemmas 17 and 18 lead to the following.


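Lemma 19. There exist $N > 0$ and $c > 0$ such that for all probability measures $\xi$ on $(\Omega_Y, \mathcal{F}_Y)$ and all $n \ge N$,

$$\mathbb{P}_\xi(Y_n \in S_Y) \ge c.$$

Proof. Fix a probability measure $\xi$ on $(\Omega_Y, \mathcal{F}_Y)$. Since $\mu(S) > 0$, by Assumption 7 one may choose $N'' > 0$ and $d > 0$ such that for all probability measures $\zeta$ on $(\Omega, \mathcal{F})$ and all $n \ge N''$,

$$\mathbb{P}_\zeta(X_n \in S) \ge d. \tag{5.8}$$

Let $N' = N'' + T_{corr} + 1$ and define $\tau$ as in (5.1). For $j \ge 0$, define probability measures $\xi_j$ on $(\Omega, \mathcal{F})$ by, for $A \in \mathcal{F}$,

$$\xi_j(A) = \mathbb{P}_\xi(\pi_1(Y_j) \in A \mid \tau = j).$$

By Lemma 14 and (5.8), for all $j \in \{0, \ldots, T_{corr} + 1\}$ and $n \ge N'$,

$$\begin{aligned}
\mathbb{P}_\xi(\pi_1(Y_n) \in S \mid \tau = j) &= \int_\Omega \mathbb{P}_\xi(\pi_1(Y_n) \in S \mid \pi_1(Y_j) = x, \tau = j)\,\xi_j(dx) \\
&= \int_\Omega \mathbb{P}_{(x,0)}(\pi_1(Y_{n-j}) \in S)\,\xi_j(dx) \\
&= \int_\Omega \mathbb{P}_x(X_{n-j} \in S)\,\xi_j(dx) \\
&= \mathbb{P}_{\xi_j}(X_{n-j} \in S) \ge d.
\end{aligned}$$

So by Lemma 17, for all $n \ge N'$,

$$\begin{aligned}
\mathbb{P}_\xi(\pi_1(Y_n) \in S) &\ge \sum_{j=0}^{T_{corr}+1} \mathbb{P}_\xi(\pi_1(Y_n) \in S \mid \tau = j)\,\mathbb{P}_\xi(\tau = j) \\
&\ge d\,\mathbb{P}_\xi(\tau \le T_{corr} + 1) \\
&\ge d\delta.
\end{aligned} \tag{5.9}$$

Let $N = N' + T_{corr}$ and fix $n \ge N$. Define a probability measure $\phi_n$ on $(\Omega_Y, \mathcal{F}_Y)$ with support in $S \times \{0, \ldots, T_{corr}\}$ by, for $A \in \mathcal{F}$ and $t \in \{0, \ldots, T_{corr}\}$,

$$\phi_n(A \times \{t\}) = \mathbb{P}_\xi\big(Y_{n-T_{corr}} \in A \times \{t\} \,\big|\, \pi_1(Y_{n-T_{corr}}) \in S\big).$$

By Lemma 18 and (5.9),

$$\begin{aligned}
\mathbb{P}_\xi(Y_n \in S_Y) &\ge \mathbb{P}_\xi\big(Y_n \in S_Y \,\big|\, \pi_1(Y_{n-T_{corr}}) \in S\big)\,\mathbb{P}_\xi(\pi_1(Y_{n-T_{corr}}) \in S) \\
&= \mathbb{P}_{\phi_n}(Y_{T_{corr}} \in S_Y)\,\mathbb{P}_\xi(\pi_1(Y_{n-T_{corr}}) \in S) \\
&\ge \mathbb{P}_{\phi_n}(Y_{T_{corr}} \in S_Y)\,d\delta \\
&\ge \alpha^{T_{corr}}\,d\delta.
\end{aligned}$$

Taking $c = \alpha^{T_{corr}} d\delta$, which does not depend on $\xi$, completes the proof. □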


Finally, ergodicity of the extended process $(Y_n)_{n\ge0}$ can be proved, using the tools of Section 5.2.3.

Theorem 20. There exists a (unique) measure $\mu_Y$ on $(\Omega_Y, \mathcal{F}_Y)$ such that for any probability measure $\xi$ on $(\Omega_Y, \mathcal{F}_Y)$ and any bounded measurable function $f : \Omega_Y \to \mathbb{R}$,

$$\mathbb{P}_\xi\left(\lim_{n \to \infty} \frac{f(Y_0) + \cdots + f(Y_{n-1})}{n} = \int_{\Omega_Y} f \, d\mu_Y\right) = 1.$$

Moreover, for any probability measure $\xi$ on $(\Omega_Y, \mathcal{F}_Y)$,

$$\lim_{n \to \infty} \|\mathbb{P}_\xi(Y_n \in \cdot) - \mu_Y\| = 0.$$
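Proof. First, it is claimed that $(Y_n)_{n\ge0}$ is a Harris chain. Recall that $S$ and $S_Y$ are defined as in (5.4). Lemma 19 shows that for any $(x,t) \in \Omega_Y$,

$$\mathbb{P}_{(x,t)}\left(\inf\{n \ge 0 : Y_n \in S_Y\} < \infty\right) > 0. \tag{5.10}$$

Define a probability measure $\rho$ on $(\Omega_Y, \mathcal{F}_Y)$ with support in $S_Y$ by: for $A \in \mathcal{F}$ and $t \in \{0, \ldots, T_{corr}\}$,

$$\rho(A \times \{t\}) = \begin{cases} \lambda(A), & t = T_{corr}, \\ 0, & \text{else}. \end{cases}$$

Let $C \in \mathcal{F}_Y$ with $C \subset S_Y$. Then $C = A \times \{T_{corr}\}$ with $A \in \mathcal{F}$, $A \subset S$. From Assumption 8, for any $(x,t) \in S_Y$,

$$\mathbb{P}_{(x,t)}(Y_1 \in C) = \mathbb{P}_x(X_1 \in A) \ge \alpha\lambda(A) = \alpha\rho(C). \tag{5.11}$$

One can check that $(Y_n)_{n\ge0}$ is a Harris chain by taking $A = B = S_Y$, $\varepsilon = \alpha$, and $\rho$ as above in the definition of Harris chains in Section 5.2.3.

Next it is proved that $(Y_n)_{n\ge0}$ is ergodic. Let $(\bar Y_n)_{n\ge0}$ be the auxiliary chain defined as in Section 5.2.3. Note that

$$\mathbb{P}_\sigma(\bar Y_1 = \sigma) = \alpha.$$

This shows the second assumption of Lemma 16 holds, that is,

$$\text{g.c.d.}\{n \ge 1 : \mathbb{P}_\sigma(\bar Y_n = \sigma) > 0\} = 1,$$

since 1 is in the set. Consider now the first assumption. It must be shown that

$$\sum_{n=1}^{\infty} \mathbb{P}_\sigma(\bar Y_n = \sigma) = \infty. \tag{5.12}$$

By Lemma 19, one can choose $N > 0$ and $c > 0$ such that for all probability measures $\xi$ on $(\Omega_Y, \mathcal{F}_Y)$ and all $n \ge N$,

$$\mathbb{P}_\xi(Y_n \in S_Y) \ge c.$$

Define a probability measure $\bar\xi$ on $(\bar\Omega_Y, \bar{\mathcal{F}}_Y)$ by

$$\bar\xi(A) = \mathbb{P}_\sigma(\bar Y_1 \in A \mid \bar Y_1 \neq \sigma) \ \text{ for } A \in \mathcal{F}_Y, \qquad \bar\xi(\{\sigma\}) = 0, \tag{5.13}$$

and let $\xi$ be the probability measure on $(\Omega_Y, \mathcal{F}_Y)$ which is the restriction of $\bar\xi$ to $\Omega_Y$. By (5.13) and Lemma 15 with $f = 1_{S_Y}$ (so that $\bar f(\sigma) = \rho(S_Y) = 1$), for all $n \ge N$,

$$c \le \mathbb{P}_\xi(Y_n \in S_Y) = \mathbb{P}_{\bar\xi}(\bar Y_n \in S_Y) + \mathbb{P}_{\bar\xi}(\bar Y_n = \sigma) = \mathbb{P}_{\bar\xi}(\bar Y_n \in S_Y \cup \{\sigma\}). \tag{5.14}$$

Using (5.14), for $n \ge N + 1$,

$$\begin{aligned}
\mathbb{P}_\sigma(\bar Y_n \in S_Y \cup \{\sigma\}) &\ge \mathbb{P}_\sigma(\bar Y_n \in S_Y \cup \{\sigma\} \mid \bar Y_1 \neq \sigma)\,\mathbb{P}_\sigma(\bar Y_1 \neq \sigma) \\
&= (1 - \alpha)\,\mathbb{P}_\sigma(\bar Y_n \in S_Y \cup \{\sigma\} \mid \bar Y_1 \neq \sigma) \\
&= (1 - \alpha)\,\mathbb{P}_{\bar\xi}(\bar Y_{n-1} \in S_Y \cup \{\sigma\}) \\
&\ge (1 - \alpha)c.
\end{aligned} \tag{5.15}$$

Now by (5.15), for $n \ge N + 2$,

$$\begin{aligned}
\mathbb{P}_\sigma(\bar Y_n = \sigma) &\ge \mathbb{P}_\sigma(\bar Y_n = \sigma \mid \bar Y_{n-1} \in S_Y \cup \{\sigma\})\,\mathbb{P}_\sigma(\bar Y_{n-1} \in S_Y \cup \{\sigma\}) \\
&\ge \mathbb{P}_\sigma(\bar Y_n = \sigma \mid \bar Y_{n-1} \in S_Y \cup \{\sigma\})\,(1 - \alpha)c \\
&= \alpha(1 - \alpha)c > 0.
\end{aligned}$$

Thus (5.12) holds. The result now follows from Lemma 16.

□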
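For a finite state space, the conclusion of Theorem 20 can be observed directly: the rows of $K^n$, where $K$ is the transition matrix of $(Y_n)_{n\ge0}$, converge in total variation to the stationary distribution $\mu_Y$. The sketch below is self-contained and rests on the same hypothetical setting as a finite example: the metastable sets partition the state space, the QSDs are exact Perron eigenvectors, and the four-state chain is illustrative only. The total variation distance is taken as $\sup_A |\cdot|$, that is, half the $\ell^1$ distance.

```python
import numpy as np

def qsd_vector(P, S):
    """QSD in S as a full-length vector (zero off S), via the left Perron eigenvector of P[S, S]."""
    Q = P[np.ix_(S, S)]
    vals, vecs = np.linalg.eig(Q.T)
    v = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    nu = np.zeros(len(P))
    nu[S] = v / v.sum()
    return nu

def extended_kernel(P, labels, T_corr):
    """Transition matrix of (Y_n); state (x, t) has index x * (T_corr + 1) + t."""
    m, L = len(P), T_corr + 1
    nus = {s: qsd_vector(P, np.flatnonzero(labels == s)) for s in np.unique(labels)}
    K = np.zeros((m * L, m * L))
    for x in range(m):
        s = labels[x]
        for t in range(L):
            if t == T_corr - 1:
                row, stay_t = nus[s] @ P, T_corr          # rule 2: one step from the QSD in S
            else:
                row, stay_t = P[x], min(t + 1, T_corr)    # rules 1 and 3: one step from x
            for y in range(m):
                K[x * L + t, y * L + (stay_t if labels[y] == s else 0)] += row[y]
    return K

# Hypothetical two-well chain; labels[x] is the index of the set containing x.
P = np.array([[0.85, 0.10, 0.05, 0.00],
              [0.10, 0.80, 0.00, 0.10],
              [0.05, 0.00, 0.80, 0.15],
              [0.00, 0.05, 0.15, 0.80]])
labels = np.array([0, 0, 1, 1])
K = extended_kernel(P, labels, T_corr=3)

vals, vecs = np.linalg.eig(K.T)
mu_Y = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
mu_Y = mu_Y / mu_Y.sum()                     # stationary distribution of (Y_n)

def tv_distance(n):
    """Largest total variation distance, over point-mass starts, between P(Y_n in .) and mu_Y."""
    Kn = np.linalg.matrix_power(K, n)
    return 0.5 * np.abs(Kn - mu_Y).sum(axis=1).max()

for n in (10, 50, 100, 400):
    print(n, tv_distance(n))
```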


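5.2.5. Ergodicity of the ParRep process with one replica. Next, ergodicity of $(\tilde X_n)_{n\ge0}$, the ParRep process with one replica, is proved.

Theorem 21. For all probability measures $\xi$ on $(\Omega, \mathcal{F})$ and all bounded measurable functions $f : \Omega \to \mathbb{R}$,

$$\mathbb{P}_\xi\left(\lim_{n \to \infty} \frac{f(\tilde X_0) + \cdots + f(\tilde X_{n-1})}{n} = \int_\Omega f \, d\mu\right) = 1.$$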

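Proof. Fix a probability measure $\xi$ on $(\Omega, \mathcal{F})$ and a bounded measurable function $f : \Omega \to \mathbb{R}$. Define $f_Y : \Omega_Y \to \mathbb{R}$ by

$$f_Y = f \circ \pi_1,$$

and define a probability measure $\xi_Y$ on $(\Omega_Y, \mathcal{F}_Y)$ by, for $A \in \mathcal{F}$ and $t \in \{0, \ldots, T_{corr}\}$,

$$\xi_Y(A \times \{t\}) = \begin{cases} \xi(A), & t = 0, \\ 0, & t \in \{1, \ldots, T_{corr}\}. \end{cases} \tag{5.16}$$

By Theorem 20, there exists a (unique) measure $\mu_Y$ on $(\Omega_Y, \mathcal{F}_Y)$ such that

$$\mathbb{P}_{\xi_Y}\left(\lim_{n \to \infty} \frac{f_Y(Y_0) + \cdots + f_Y(Y_{n-1})}{n} = \int_{\Omega_Y} f_Y \, d\mu_Y\right) = 1 \tag{5.17}$$

and

$$\lim_{n \to \infty} \|\mathbb{P}_{\xi_Y}(Y_n \in \cdot) - \mu_Y\| = 0. \tag{5.18}$$

Define a measure $\mu'$ on $(\Omega, \mathcal{F})$ by, for $A \in \mathcal{F}$,

$$\mu'(A) = \sum_{t=0}^{T_{corr}} \mu_Y(A \times \{t\}).$$

From this and the definition of $f_Y$,

$$\int_{\Omega_Y} f_Y \, d\mu_Y = \int_\Omega f \, d\mu'.$$

So by Lemma 13 and (5.17),

$$\begin{aligned}
1 &= \mathbb{P}_{\xi_Y}\left(\lim_{n \to \infty} \frac{f_Y(Y_0) + \cdots + f_Y(Y_{n-1})}{n} = \int_{\Omega_Y} f_Y \, d\mu_Y\right) \\
&= \mathbb{P}_\xi\left(\lim_{n \to \infty} \frac{f(\tilde X_0) + \cdots + f(\tilde X_{n-1})}{n} = \int_\Omega f \, d\mu'\right).
\end{aligned} \tag{5.19}$$

Also, by Lemma 14 and (5.18),

$$\begin{aligned}
0 &= \lim_{n \to \infty} \sup_{A \in \mathcal{F}} \left|\mathbb{P}_{\xi_Y}(Y_n \in A \times \{0, \ldots, T_{corr}\}) - \mu_Y(A \times \{0, \ldots, T_{corr}\})\right| \\
&= \lim_{n \to \infty} \sup_{A \in \mathcal{F}} \left|\mathbb{P}_\xi(X_n \in A) - \mu'(A)\right| \\
&= \lim_{n \to \infty} \|\mathbb{P}_\xi(X_n \in \cdot) - \mu'\|.
\end{aligned}$$

Using Assumption [?], one can conclude $\mu' = \mu$. So from (5.19),

$$\mathbb{P}_\xi\left(\lim_{n \to \infty} \frac{f(\tilde X_0) + \cdots + f(\tilde X_{n-1})}{n} = \int_\Omega f \, d\mu\right) = 1.$$

□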


5.2.6. Proof of main result. Here the main result, Theorem 10, is finally proved. The idea is to use ergodicity of $(\tilde X_n)_{n\ge0}$ along with the fact that the average value of the contribution to $f_{sim}$ from a Parallel Step of Algorithm 2 does not depend on the number of replicas. A law of large numbers applied to the contributions to $f_{sim}$ from all the Parallel Steps will then be enough to conclude. Note that the law of $(f_{sim})_{T_{sim} \ge 0}$ depends on the number $N$ of replicas, but this is not indicated explicitly.


Proof of Theorem 10. Fix a probability measure $\xi$ on $(\Omega, \mathcal{F})$ and a bounded measurable function $f : \Omega \to \mathbb{R}$. Define $\hat f : \Omega_Y \to \mathbb{R}$ by

$$\hat f(x,t) = \begin{cases} f(x), & t \in \{0, \ldots, T_{corr} - 1\}, \\ 0, & t = T_{corr}. \end{cases}$$

Let Algorithm 2 start at $\xi$. The quantity $f_{sim}$ will be decomposed into contributions from the Decorrelation Step and the Parallel Step. Let $f_{sim}^{corr}$ denote the contribution to $f_{sim}$ from the Decorrelation Step up to time $T_{sim}$, and let $f_{sim}^{par}$ denote the contribution to $f_{sim}$ from the Parallel Step up to time $T_{sim}$. Thus,

$$(f_{sim})_{T_{sim} \ge 0} = \left(f_{sim}^{corr} + f_{sim}^{par}\right)_{T_{sim} \ge 0}. \tag{5.20}$$


Let $(Y_n)_{n\ge0}$ start at $Y_0 \sim \xi_Y$, with $\xi_Y$ defined as in (5.16). Because the starting points $x_1, \ldots, x_N$ sampled in the Dephasing Step are independent of the history of the algorithm, each Parallel Step, in particular the pair formed by $\tau_{acc}$ and the exit point of the Parallel Step, is independent of the history of the algorithm. This and Theorem 6 imply that $(f_{sim}^{corr})_{T_{sim} \ge 0}$ has the same law for every number of replicas $N$. In particular, taking $N = 1$, Lemma 13 gives

$$\left(f_{sim}^{corr}\right)_{T_{sim} \ge 0} \sim \left(\sum_{i=0}^{T_{sim}} \hat f(Y_i)\right)_{T_{sim} \ge 0}. \tag{5.21}$$

Meanwhile, from the preceding independence argument,

$$\left(f_{sim}^{par}\right)_{T_{sim} \ge 0} \sim \left(\sum_{S \in \mathcal{S}} \sum_{i=1}^{n_{S,T_{sim}}} f_{S,N}^{(i)}\right)_{T_{sim} \ge 0}, \tag{5.22}$$

where $\{f_{S,N}^{(i)}\}_{i=1,2,\ldots}$ are iid random variables and $n_{S,T_{sim}}$ counts the number of sojourns of $(Y_n)_{n\ge0}$ in $S \times \{T_{corr}\}$ by time $T_{sim}$:

$$n_{S,T_{sim}} = \#\left\{1 \le n \le T_{sim} : Y_n \in S \times \{T_{corr}\},\ Y_{n-1} \in S \times \{T_{corr} - 1\}\right\}.$$
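The counting variable $n_{S,T_{sim}}$ only looks at consecutive pairs $(Y_{n-1}, Y_n)$, so it can be read off a recorded path. A minimal sketch, assuming the path is stored as a list of pairs $(y,t)$ and that set membership is given by a hypothetical `labels` list; the path below is hand-made (it is consistent with rules 1-3 of Section 5.2.2 for $T_{corr} = 2$) rather than simulated. Each counted time marks the start of a sojourn in $S \times \{T_{corr}\}$, which is where (5.22) attaches one term $f_{S,N}^{(i)}$.

```python
def count_sojourns(path, labels, S, T_corr):
    """n_{S,T_sim} for a path [Y_0, ..., Y_{T_sim}] of pairs (y, t).

    Counts 1 <= n <= T_sim with Y_n in S x {T_corr} and Y_{n-1} in S x {T_corr - 1}.
    """
    count = 0
    for (y_prev, t_prev), (y, t) in zip(path, path[1:]):
        if (labels[y_prev] == S and t_prev == T_corr - 1
                and labels[y] == S and t == T_corr):
            count += 1
    return count

# Hand-made path with T_corr = 2; states 0 and 1 lie in set 0, state 2 lies in set 1.
labels = [0, 0, 1]
path = [(0, 0), (1, 1), (0, 2), (0, 2), (2, 0), (2, 1), (2, 2), (0, 0), (0, 1), (1, 2)]
print(count_sojourns(path, labels, S=0, T_corr=2),   # 2
      count_sojourns(path, labels, S=1, T_corr=2))   # 1
```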
From Idealization 4 and Definition [Tl, each term in the sum in (3.1) or (3.2) of the Parallel Step has expected value $\int_S f \, d\nu$. So from linearity of expectation and Theorem 6, for any number $N$ of replicas,

$$\mathbb{E}\left[f_{S,N}^{(i)}\right] = \left(\mathbb{E}[\tau_{acc}] - 1\right)\int_S f \, d\nu = \left(\mathbb{P}_\nu(X_1 \notin S)^{-1} - 1\right)\int_S f \, d\nu. \tag{5.23}$$
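Formula (5.23) can be checked by Monte Carlo for a single replica. A minimal sketch, assuming that the terms summed in a Parallel Step are the positions $X_1, \ldots, X_{\tau-1}$ of a chain started at an exact QSD sample (equations (3.1) and (3.2) are not reproduced in this section, so this reading is an assumption), and using a hypothetical four-state chain and observable. By the QSD property each such term has mean $\int_S f\,d\nu$, and on average there are $\mathbb{P}_\nu(X_1 \notin S)^{-1} - 1$ of them.

```python
import numpy as np

def qsd(P, S):
    """QSD nu in S and lam = P_nu(X_1 in S), from the left Perron eigenvector of P[S, S]."""
    Q = P[np.ix_(S, S)]
    vals, vecs = np.linalg.eig(Q.T)
    k = np.argmax(np.real(vals))
    nu = np.abs(np.real(vecs[:, k]))
    return nu / nu.sum(), float(np.real(vals[k]))

def sojourn_sums(P, S, f, n_samples, rng):
    """Samples of f(X_1) + ... + f(X_{tau-1}) with X_0 ~ nu and tau the exit time from S."""
    nu, lam = qsd(P, S)
    cdf, nu_cdf = np.cumsum(P, axis=1), np.cumsum(nu)
    in_S = np.isin(np.arange(len(P)), S)
    out = np.empty(n_samples)
    for i in range(n_samples):
        # X_0 drawn exactly from the QSD, as under the idealization
        x = S[min(int(np.searchsorted(nu_cdf, rng.random(), side="right")), len(S) - 1)]
        total = 0.0
        while True:
            x = min(int(np.searchsorted(cdf[x], rng.random(), side="right")), len(P) - 1)
            if not in_S[x]:
                break                  # first exit from S: tau reached
            total += f[x]
        out[i] = total
    return out, nu, lam

# Hypothetical two-well chain and observable.
P = np.array([[0.85, 0.10, 0.05, 0.00],
              [0.10, 0.80, 0.00, 0.10],
              [0.05, 0.00, 0.80, 0.15],
              [0.00, 0.05, 0.15, 0.80]])
S = np.array([0, 1])
f = np.array([1.0, 2.0, 0.0, 0.0])
rng = np.random.default_rng(2)
samples, nu, lam = sojourn_sums(P, S, f, 20_000, rng)
predicted = (1.0 / (1.0 - lam) - 1.0) * (nu @ f[S])   # right-hand side of (5.23) with N = 1
print(samples.mean(), predicted)
```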


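Combining (5.20), (5.21) and (5.22), for any number $N$ of replicas,

$$\left(\frac{f_{sim}}{T_{sim}}\right)_{T_{sim} > 0} \sim \left(\frac{1}{T_{sim}}\left[\sum_{i=0}^{T_{sim}} \hat f(Y_i) + \sum_{S \in \mathcal{S}} \sum_{j=1}^{n_{S,T_{sim}}} f_{S,N}^{(j)}\right]\right)_{T_{sim} > 0}, \tag{5.24}$$

where the two processes summed on the right-hand side of (5.24) are assumed independent. Let $(\tilde X_n)_{n\ge0}$ start at $\tilde X_0 \sim \xi$. From the definition of $(\tilde X_n)_{n\ge0}$ and (5.24), when the number of replicas is $N = 1$,

$$\left(\frac{f_{sim}}{T_{sim}}\right)_{T_{sim} > 0} \sim \left(\frac{f(\tilde X_0) + \cdots + f(\tilde X_{T_{sim}})}{T_{sim}}\right)_{T_{sim} > 0} \sim \left(\frac{1}{T_{sim}}\left[\sum_{i=0}^{T_{sim}} \hat f(Y_i) + \sum_{S \in \mathcal{S}} \sum_{j=1}^{n_{S,T_{sim}}} f_{S,1}^{(j)}\right]\right)_{T_{sim} > 0}, \tag{5.25}$$

where the processes in (5.25) are assumed independent. Since $(Y_n)_{n\ge0}$ is Markov, the number of time steps $n$ for which $Y_n \in S \times \{T_{corr}\}$ is either finite almost surely or infinite almost surely. By Theorem 6 and Assumption 9, the expected value of each of the sojourn times of $(Y_n)_{n\ge0}$ in $S \times \{T_{corr}\}$ is $1/\mathbb{P}_\nu(X_1 \notin S) \le 1/\delta < \infty$, so the sojourn times are finite almost surely. This means that either $(Y_n)_{n\ge0}$ has infinitely many sojourns in $S \times \{T_{corr}\}$ almost surely, or $(Y_n)_{n\ge0}$ has finitely many sojourns in $S \times \{T_{corr}\}$ almost surely. Thus, for every $S \in \mathcal{S}$, either

$$\mathbb{P}_{\xi_Y}\left(\lim_{T_{sim} \to \infty} n_{S,T_{sim}} = \infty\right) = 1 \quad \text{or} \quad \mathbb{P}_{\xi_Y}\left(\lim_{T_{sim} \to \infty} \frac{n_{S,T_{sim}}}{T_{sim}} = 0\right) = 1.$$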


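Define $\tau_S^{(0)} = 0$ and for $i = 1, 2, \ldots$,

$$\tau_S^{(i)} = \inf\left\{n > \tau_S^{(i-1)} : Y_n \in S \times \{T_{corr}\},\ Y_{n-1} \in S \times \{T_{corr} - 1\}\right\}, \qquad \sigma_S^{(i)} = \tau_S^{(i)} - \tau_S^{(i-1)}. \tag{5.26}$$

Note that $\{\sigma_S^{(i)}\}_{i=1,2,\ldots}$ are iid and

$$\frac{1}{n_{S,T_{sim}}} \sum_{i=1}^{n_{S,T_{sim}}} \sigma_S^{(i)} \le \frac{T_{sim}}{n_{S,T_{sim}}} < \frac{1}{n_{S,T_{sim}}} \sum_{i=1}^{n_{S,T_{sim}}+1} \sigma_S^{(i)}.$$

If $n_{S,T_{sim}} \to \infty$ almost surely as $T_{sim} \to \infty$, then by the strong law of large numbers there is a constant $c_S$ (depending on $S$) such that

$$\mathbb{P}_{\xi_Y}\left(\lim_{T_{sim} \to \infty} \frac{T_{sim}}{n_{S,T_{sim}}} = c_S\right) = 1. \tag{5.27}$$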
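From (5.26), (5.27) and the strong law of large numbers, there is a constant $c$ such that

$$\mathbb{P}\left(\lim_{T_{sim} \to \infty} \frac{1}{T_{sim}} \sum_{S \in \mathcal{S}} \sum_{j=1}^{n_{S,T_{sim}}} f_{S,N}^{(j)} = c\right) = 1, \tag{5.28}$$

and due to (5.23) this $c$ does not depend on the number of replicas $N$. By using Theorem 21 along with (5.25) and (5.28),

$$\mathbb{P}_{\xi_Y}\left(\lim_{T_{sim} \to \infty} \frac{1}{T_{sim}} \sum_{i=0}^{T_{sim}} \hat f(Y_i) = \int_\Omega f \, d\mu - c\right) = 1. \tag{5.29}$$

Now using (5.24), (5.28) and (5.29), for any number $N$ of replicas,

$$\mathbb{P}_\xi\left(\lim_{T_{sim} \to \infty} \frac{f_{sim}}{T_{sim}} = \int_\Omega f \, d\mu\right) = 1.$$

□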